Variable Selection for Model-Based Clustering
نویسندگان
چکیده
We consider the problem of variable or feature selection for model-based clustering. We recast the problem of comparing two nested subsets of variables as a model comparison problem, and address it using approximate Bayes factors. We develop a greedy search algorithm for finding a local optimum in model space. The resulting method selects variables (or features), the number of clusters, and the clustering model simultaneously. We applied the method to several simulated and real examples, and found that removing irrelevant variables often improved performance. Compared to methods based on all the variables, our variable selection method consistently yielded more accurate estimates of the number of clusters, and lower classification error rates, as well as more parsimonious clustering models and easier visualization of results.
منابع مشابه
Multivariate Estimation of Rock Mass Characteristics Respect to Depth Using ANFIS Based Subtractive Clustering- Khorramabad- Polezal Freeway Tunnels
Combination of Adoptive Network based Fuzzy Inference System (ANFIS) and subtractive clustering (SC) has been used for estimation of deformation modulus (Em) and rock mass strength (UCSm) considering depth of measurement. To do this, learning of the ANFIS based subtractive clustering (ANFISBSC) was performed firstly on 125 measurements of 9 variables such as rock mass strength (UCSm), deformati...
متن کاملSteel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps
Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...
متن کاملOn Model-Based Clustering, Classification, and Discriminant Analysis
The use of mixture models for clustering and classification has burgeoned into an important subfield of multivariate analysis. These approaches have been around for a half-century or so, with significant activity in the area over the past decade. The primary focus of this paper is to review work in model-based clustering, classification, and discriminant analysis, with particular attenti...
متن کاملVariable selection for clustering with Gaussian mixture models: state of the art
The mixture models have become widely used in clustering, given its probabilistic framework in which its based, however, for modern databases that are characterized by their large size, these models behave disappointingly in setting out the model, making essential the selection of relevant variables for this type of clustering. After recalling the basics of clustering based on a model, this art...
متن کاملVariable selection in model-based clustering: A general variable role modeling
The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: The relevant clustering variables, the irrelevant clustering variables dependent o...
متن کاملA Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006